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Abstract. One of the brightest Gamma Ray Burst ever recorded, GRB030329, 
occurred during the second science run of the LIGO detectors. At that time, both 
interferometers at the Hanford, WA LIGO site were in lock and acquiring data. 
The data collected from the two Hanford detectors was analyzed for the presence 
of a gravitational wave signal associated with this GRB. This paper presents a 
detailed description of the search algorithm implemented in the current analysis. 
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1. Introduction 

On 29 March 2003, one of the closest and brightest ever Gamma Ray Burst, 
GRB030329 pQ, was detected by instruments aboard the HETE-2 satellite. Subsequent 
afterglow observations fixed the redshift of the host galaxy at z ~ 0.17 At the time 
when the GRB030329 was detected, the two LIGO interferometers at the Hanford site, 
were in lock and taking data. The data in the vicinity of the GRB trigger time was 
analyzed for the presence of a gravitational wave burst signal. The results from the 
analysis as well as relevant details about the trigger itself will be reported elsewhere [3] • 
In this paper we present details of the search algorithm used in this analysis. 

The distinction between a full analysis pipeline and a search algorithm should 
be emphasized. The analysis pipeline is the entire process that starts with the raw 
intcrfcrometric data output and ends with a statistical inference, such as a detection 
or no-detection, or a confidence interval estimate. 

The search algorithm is the component which acts on pre-processed data and 
produces a list of candidate events or the value of a statistic which is then used in 
drawing a statistical inference. A first version of a full analysis pipeline was reported 
in 0] and the pipeline used for the present analysis will be described in 

The paper is structured as follows. Section [21 presents the line of reasoning that 
leads to the current form of the search algorithm. Section |3] gives a brief outline of 
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the preparation of raw data before it is fed to the algorithm. Section 0] then describes 
the algorithm at a level of detail which should be sufficient to allow independent 
reproduction. Our conclusions are presented in Section [SJ 



2. Justification of the algorithm 

The majority of present day astrophysical models and theories of GRBs suggest that 
the associated gravitational wave signal is a short burst lasting for tens of milliseconds 
contemporaneous with the GRB. There are also some models which predict signals 
lasting over tens of seconds. The analysis carried out in [3j focusses on short duration 
(< 150 msec) bursts. 

At the time GRB030329 occurred, only the two co-located and co-aligned Hanford 
detectors were acquiring data. The spectral responses of these detectors are also very 
similar. Therefore, the arrival time and measured waveforms of a gravitational wave 
signal should be essentially the same in both detectors. Ergo, it is natural to consider 
cross-correlation of the two data streams as the basis of a search algorithm. This 
conclusion can also be reached via a more formal argument based on the maximum 
log-Likelihood ratio test. 

Let us use a simple, albeit unrealistic, model for detector noise to illustrate the 
rationale behind the method. Assume that the two detectors have uncorrelated noise. 
Further assume that the noise in each is zero mean, Gaussian and stationary. Without 
loss of generality, this is equivalent to the noise being white with unit variance. Let the 
GW signal waveform in the output of HI be h = {h[0], h[l], . . . , h[N — 1]}, where N is 
the length of the data segment being analyzed. The signal waveform in the output of 
H2 will be essentially the same modulo an overall amplitude factor that we will ignore 
for the present argument. 

Let Xl = {xi[0],xi[l],...,xi[A - 1]} and x 2 = {x 2 [0], x 2 [l], . . . , x 2 [N - 1]} 
denote data segments from HI and H2 respectively. For the case outlined above, 
the Logarithm of the Likelihood Ratio |S] is given by, 
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where p(xi , x 2 1 h) is the joint probability density of the data in the presence of a signal 
h. 

A(xi,X2|h) is parametrized by the signal sample values h[k]. In the absence of 
any prior knowledge about the signal waveform, we shall simply maximize A(xi, x 2 |h) 
over each h[k] independently. The maximum value, A(xi,x 2 ), of A(xi,x 2 |h) occurs 
for h = (xi + x 2 )/2, and 

^ JV-l - iV-l ^ N-l 

A(x 1; x 2 ) = ±Y1 Xl W 2 + + 2E ^iM^W ■ (2) 

i=Q i=0 i=Q 

Thus, the test statistic obtained from the Likelihood Ratio prescription involves the 
second moments of the individual data segments and their cross-correlation. 

Our simplified assumptions do not hold for real data: interferometer noise is non- 
Gaussian and non-stationary; HI and H2 are not completely uncorrelated. Among 
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sources of deviation from Gaussianity, stationarity and independence are high power 
narrowband noise (also called lines) and bursts of instrumental origins. 

Lines in the two data streams with nearby frequencies can produce large spurious 
correlations, especially in short data segments. The presence of bursts will adversely 
affect the performance of A(xi , X2) since even if such a non-GW burst were to occur in 
only one detector, A(xi, X2) may cross the detection threshold because of the variance 
terms. 

Lines are probably the easier problem to address, since most high power features 
seems to be persistent in their carrier frequencies and bandwidths. Therefore, they 
are easily identified and can be removed using notch filters (see Section |3J). Lines 
that are not present at all times pose a more complex problem but this is still 
easier to trace than instrumental bursts. Therefore, it makes sense to use only 
the cross-correlation term as the basis of a detection strategy. (A cross-correlation 
based waveform consistency test has also been independently developed for use with 
untriggered burst searches [7].) 

We know that GW signals must have about the same time of arrival in the output 
of the two Hanford detectors but we do not know a priori the delay between the GRB 
and the GW signal. Consequently, we do not know the absolute time of arrival of the 
GW signal itself. However, based on most present astrophysical models and known 
observational uncertainity in the onset of a GRB, we can make an educated guess 
about the start and stop times of the data segment to be analyzed. In our analysis, 
the start time is chosen to be 120 sec before the GRB arrival time and the stop time 
is fixed at 60 sec after it. We also verify the insensitivity of the search for 20-30% 
perturbation of these boundaries. This interval (~180s), called the on-source segment, 
is much larger than the expected duration of GW signal (~O(10-100ms)). 

The optimum length of the data segments to use for computing the cross- 
correlation depends on the duration of the signal and its signal-to-noisc ratio, both of 
which are not known a priori. The combination of these two facts suggests that the 
cross-correlation of HI and H2 outputs should be computed on segments whose start 
times and lengths vary over a range covering the allowed range of arrival time and the 
expected duration of the GW burst signal. 

An artificial time shift, much larger than the expected signal duration of ~O(10- 
100ms), between the two data streams before computing cross-correlations suppresses 
the average contribution from the GW signal. This fact can be used to make a local 
estimation of noise properties, thus mitigating the effects of non-stationarity in the 
interferometer outputs. 

The line of reasoning outlined above naturally leads to a search algorithm 0] that 
processes the data as follows. (1) A three dimensional random field is constructed, 
consisting of cross-correlations for a range of start times, integration lengths and time 
shifts. A signal must produce a local change in the distribution of samples in this 
random field. (2) The signature of the signal is extracted in a systematic manner. 
First, the three dimensional field is reduced to a two dimensional image, called a 
corrgram. The value of a pixel in the corrgram indicates the significance of cross- 
correlations near zero time shift relative to large time shift cross-correlations. (3) 
Finally, a list of events is found by identifying significant regions in the corrgram 
image, each event being characterized by a peak time, optimal integration length and 
an event strength. The strength of each event can be compared with a preset detection 
threshold or can be translated into a confidence interval/upper limit for the strength 
of the underlying GW signal. 
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The detection threshold and the confidence intervals are obtained via extensive 
Monte Carlo simulations using sections of the interferometer output, called off -source 
data, that are well outside the on-source segment. 

3. Pre-processing and data conditioning 

To mitigate the effect of instrumental artifacts and uneven spectral sensitivity, the raw 
output from the detectors needs to be processed before using the search algorithm. 
A brief description of this part of the pipeline is provided here for the sake of 
completeness. 

The output from each interferometer is broken into 330 sec long sections with a 
15 second overlap between consecutive segments, providing a continuous tiling of the 
data with 300s long segments. In order to avoid edge effects, the on-source segment 
(180 sec long), lies in the middle of one such 330 sec long segment. The pre-processing 
steps are, as follows. For each 330 sec long segment: 

(i) Time-domain, zero-phase notch filters are applied to remove a set of narrow 
frequency bands corresponding to known line features. 

(ii) A simple IIR filter is applied to H2 data with a phase response that approximates 
the difference in the phase responses of HI and H2 calibration transfer functions. 
This restores the proper phase and waveform relation between the possible 
gravitational wave signals in the two data streams. This is to compensate for 
the slightly different spectral response of the two Hanford interferometers. 

(iii) Each section is bandpass filtered using time domain zero-phase filtering. The 
lower and upper frequencies of the pass band arc 80 Hz and 2048 Hz respectively 

(iv) The bandpassed data is downsampled from 16384 Hz to 4096 Hz by dropping the 
appropriate subset of samples. 

(v) A linear predictor filter [SJ (LPF) is trained on the data between 5 and 15 seconds 
at the beginning. The LPF is a procedure for constructing a high order FIR filter 
whose transfer function is the inverse of the square root of the Power Spectral 
Density (PSD) of the data. 

(vi) The first and last 5 sec of the resulting data arc dropped in order to avoid filter 
transients. Only very short portions of first and last 10 sec are processed by the 
search algorithm, in order to avoid edge effects at the beginning and the end of 
each 300s long tile. 

The result of the pre-processing is a 320 sec segment consisting of noise with a flat 
PSD {white noise ) between ^80 Hz and ~2048 Hz. In practice, only the middle 300 
sec from each 320 sec segment are used to find events. The first and last 10 seconds 
are not used to make the final event list but are processed in order to take care of 
edge effects. 

The preprocessing and data conditioning outlined above also modifies a signal 
present in the data. However, all the filters applied to the data, except for the H2 
spectral shape correction and the LPF derived filter, are the same for both HI and 
H2 and zero-phase filtering is used for all filters, including the LPF derived ones. The 
spectral shape of HI and H2 were close enough that the LPF whitening filters did not 
have to be significantly different. This ensured that the signal start times in the pre- 
processed data had no significant shift and the signal phases and waveforms remained 
about the same in the pre-processed and conditioned data streams. 
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4. Detailed description of analysis algorithm 

We will now describe the main analysis algorithm that is applied to pairs of pre- 
processed data segments from HI and H2. The basic idea behind the algorithm has 
already been stated in Section 
First, we fix our notation. 

(i) Let the pre-processed time series from HI be Hi = {f/i[0], Hi[l], . . . H\[N — 1]} 
and from H2 be H 2 = {H 2 [0], H 2 [l], . . . H 2 [N - 1]}. 

(ii) Let si [k, j] be a subsegment from Hi , si [k, j] = {Hi [k] , Hi [k + 1] , . . . , Hi [k + j — 
1]}. Similary define S2[fc, j] be a subsegment from H2. 

(iii) The cross-correlation Ck, m .j between segments Si[fc,j] and s 2 [m, j] is defined as, 

j'-i 

C k ,m,j =J2 H ^ k + + i] . (3) 

i=0 

(Note that the ordering of the first two indices in the definition of Ck >m .j is 
important since they correspond to data from different detectors.) 

(iv) The length j/4096 sec of the segment pair is the integration length and fc/4096 sec 
is the offset. 

(v) Let Ckj[p], p = 0, ±1, ±2, . . ., be the sequence, 

r 2 2 1 1/2 

Ck,j\p] = C k _j/ 2 ,k+p-j/2,j + C k _ p _j/2,k-j/2,j ■ (4) 

(j is always kept even.) 

^fcjM contains the autocorrelation of the GW signal for samples at and near 
p = 0, while far away samples only have the signal-noise cross-correlation. The 
far from zero time shift samples of Ckj [p] can be used to make an estimation of 
background mean and variance. This estimate uses detector noise that is localized 
near p = and is, thus, advantageous in the presence of non-stationarity. 

(vi) We divide the sequence Ck,j[p\ into two parts: 
-W/2 < p < W/2 core 

(5) 

2.5j < |p| < 2.5j + W side lobes 

where W is an even integer. For a core size W comparable to the effective duration of 
the signal autocorrelation function, a sum over core samples will have a comparable 
or larger signal to noise ratio than the sample at p = 0. In the analysis algorithm, the 
core size is max{20, j} < W < min{j, 40}. In practice, the analysis is found not to be 
sensitive to the exact choice of the core size around this point. 

The steps in the algorithm are as follows. First, we fix a rectangular grid in 
offset and integration length space. Let the list of integration lengths be / and that 
of offsets be O. Let the coordinates of the grid points be (fc,j), with 7[fc]/4096 sec 
and O[j]/4096 sec being the integration length and offset respectively. The spacing, 
(0[k + 1] — O[fc])/4096, between grid points along the offset direction is 1 msec. The 
distribution of the integration lengths are logarithmic. Each consecutive integration 
length is 50% longer than the previous one. 

(i) For each point (k,j), the sequence Coui n/.i[p] is computed over a range of p 
covering the lobes. 



Search algorithm for GWs from GRB030329 



6 



(ii) The mean fi^j and standard deviation a^j of Co hi, /[A;] [p] m the lobe regions is 
computed. 

(iii) The values of CquviWi [p] in the core region are standardized by subtracting fik.j 
and then dividing by (Tk,j- Positive standardized values in the core region are 
summed. Let the sum be -Xfcj • We call this two dimensional random field the 
corrgram and each pixel contains the "excess correlation" of close-to zero versus 
non-zero time shifts. (A standardization using unphysical time shifts was also 
used in [Sj.) 

(iv) For each j, 

(a) the largest 3% of values are dropped from the sequence Xf.j. 

(b) A Gamma distribution fit is found for the distribution of the uncorrelated 
portion of the remaining pixels. The mean /^ 7 [j] and variance er^[j] are 
computed and stored. 

(v) Each Xkj is standardized by subtracting ^ 7 [j] and dividing by a 1 [j] . 

The purpose of step|^is to bring all pixels of the corrgram on to the same footing. 



The purpose of step (iv)a is to prevent the biasing of the significance estimate by 
the presence of a strong signal or instrumental artifacts. 

(vi) A very low threshold of 0.05 is used to partition the standardized corrgram into 
pixels with values above and below the threshold. The pixels in the latter set are 
assigned the value zero. We continue to denote each pixel value as Xkj. 

(vii) A clustering algorithm is used to identify groups of adjacent pixels with non-zero 
values. The steps in the clustering algorithm are as follows: 

(a) Each pixel X^j is assigned a binary score, 

score 1 if X k ± 1J±1 ^ , (6) 
and score otherwise. For pixels at the borders of the corrgram, only the 
five nearest neighbours are required to be non-zero. 

(b) For all (fc, j) such that Xkj has score 1, set Xkj = if Xk.j < 0.5. We thus 
drop a group of pixels whose innermost pixel has a low value. This reduces 
the number of small clusters due to background noise. The likelihood of 
rejecting a genuine signal is small since, for the corresponding cluster of 
pixels, higher pixel values are most likely to occur inside the cluster and not 
on its border. 

(viii) Take A = maxfc j Xk.j and let A = X m<n (m, n thus being the location of the 
maximum) . . 

(ix) Let Gm,n = {Xk.j | n — bo < j < n + bo}, where bo is such that \0[n ± bo] — 
0[n] |/4096 = 0.064 sec, that is, G m . n is the set of all pixel values within 64 msec 
of the maximum. (Border pixels are handled as special cases.) Thus, we count 
pixels lying within 64 msec of each other as a single event. This choice is guided 
by Monte Carlo studies using a variety of waveforms. 

(x) The average of the 5 highest pixels in G m ^ n is computed. This quantity is 
called the strength of the event. The triplet of strength, integration length 
= /[to]/4096 sec and time of arrival = O[n]/4096 sec are stored. Each triplet 
constitutes an event. 

(xi) Set Xk.j — for n — bo < j < n + bo- Repeat step Iviiil until no more non-zero 
pixels are left. 
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(xii) Finally a list of events is obtained. Each event is a potential GW signal candidate. 

The list of events is used as the basis for statistical inferences which form a separate 
part of the analysis pipeline. In brief, two types of inferences are drawn, detection 
and, in the absence of detection, an upper limit on the root sum square of a possible 
gravitational wave signal. For the former, a detection threshold on the strength of 
each event, corresponding to a reliably known false alarm rate, is used. The threshold 
is estimated via Monte Carlo simulations using off-source data. The upper limit 
estimation is also performed via Monte Carlo simulations based on off-source data but 
this time with a variety of signal waveforms injected in software. The root sum square 
value of a signal that leads to, say, 90% detection probability for a given threshold is 
taken as the 90% confidence level upper limit on the root sum square of a possible 
GW signal. Details are provided in 

5. Conclusions 

We have presented a detailed description of the algorithm used in a search for 
gravitational waves associated with GRB030329 using data from the two Hanford 
LIGO interferometers. The search algorithm involves a scan over the all three free 
parameters involved in the cross-correlation based search: integration length, offset 
and relative time-shifts. This is followed by a systematic extraction of the information 
contained in the resulting three dimensional phase space in the form of an event list. 

The performance of the full analysis pipeline has been quantified using simulated 
data and the results are encouraging. As far as the search algorithm itself is concerned, 
more effort will be put towards the analytical characterization of its performance. We 
will also explore improvements in the computational performance of the code. This 
is required in order to reduce the time needed for the background studies and Monte 
Carlo simulations which dominates computational costs at the present. 
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